Prosody Conversion for Emotional Mandarin Speech Synthesis Using the Tone Nucleus Model

نویسندگان

  • Miaomiao Wen
  • Miaomiao Wang
  • Keikichi Hirose
  • Nobuaki Minematsu
چکیده

In this paper, tone nucleus model is employed to represent and convert F0 contour for synthesizing an emotional Mandarin speech from a neutral speech. Compared with previous prosody transforming methods, the proposed method 1) only converts the tone nucleus part of each syllable rather than the whole F0 contour to avoid the data sparseness problems; 2) builds mapping functions for well-chosen tone nucleus model parameters to better capture Mandarin tonal information. Using only a modest amount of training data, the perceptual accuracy achieved by our method was shown to be comparable to that obtained by a professional speaker.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Emotional Voice Conversion for Mandarin using Tone Nucleus Model – Small Corpus and High Efficiency

The GMM-based spectral conversion techniques were applied to emotion conversion but it was found that spectral transformation alone is not sufficient for conveying the required target emotion. In this paper, we adopt the tone nucleus model to carry the most important information of tones and represent F0 contour for Mandarin speech. And then tone nucleus part is converted to emotional speech fr...

متن کامل

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...

متن کامل

Tone Generation by Maximizing Joint Likelihood of Syllabic HMMs for Mandarin Speech Synthesis

A tone generation method by maximizing the joint likelihood of syllabic HMMs is proposed to improve the Mandarin speech synthesis. F0 sequence is generated by jointly maximizing the likelihood of the state-level F0 model and syllable-level tone model under the constraint of mean F0 of the adjacent units. The optimal weight of the tone component is searched in terms of the parameter generation e...

متن کامل

GMM-based voice conversion applied to emotional speech synthesis

Voice conversion method is applied to synthesizing emotional speech from standard reading (neutral) speech. Pairs of neutral speech and emotional speech are used for conversion rule training. The conversion adopts GMM (Gaussian Mixture Model) with DFW (Dynamic Frequency Warping). We also adopt STRAIGHT, the high-quality speech analysis-synthesis algorithm. As conversion target emotions, (Hot) a...

متن کامل

Novel eigenpitch-based prosody model for text-to-speech synthesis

Prosody is an inherent supra-segmental feature in speech that human speakers employ to express, for example, attitude, emotion, intent and attention. In textto-speech (TTS) systems, high naturalness can only be achieved if the prosody of the output is appropriate. The importance of prosody is even more crucial for tonal languages, such as Mandarin Chinese, in which the tone of each syllable is ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011